Abstract

The possibility of aligning the dual goals of an optimal stochastic
controller is discussed. It is suggested that when the measurement
function is chosen so that these two dual goals are aligned, an
artificial separation will occur. This will occur since the action
taken to follow a trajectory will also lead to the best possible use
of the control for estimation purposes. A simple set of examples
describing the nature of such alignment and cases of nonalignment is
given.


1.  INTRODUCTION

Inherent in the problem of stabilization and control of most dynamic
systems is the problem of processing noise-contaminated measurement
data to obtain accurate information about the state of the generally
nonlinear stochastic system. If the state can be accurately estimated,
then classical deterministic control techniques, or an approximate
linearized quadratic Gaussian approach, can often be used to give
adequate system performance. The classical deterministic controller is
often of the form of a feedback control scheme.


As pointed out by Feldbaum, the optimal stochastic controller for
nonlinear stochastic systems can often be thought of as having two
(possibly conflicting) goals. The first goal is to drive the "true"
state of the system over or near a desired trajectory in state space.
The second goal is to obtain the most accurate information about the
value of the "true" state.


The desired trajectory is usually specified in terms of a cost
functional. One measure of the cost of a trajectory is its deviation
from the desired trajectory. This cost is added to a cost for the
control action required to traverse the trajectory. The "optimal"
trajectory is the one which minimizes the combined cost found by
adding these two.

2.  TWO  GOALS 

If there are no uncertain system parameters, and no noise driving the
dynamic system or corrupting the measurements, then the optimal
feedback control at time $t_k$ is a function of the state at time
$t_k$,

$$U_k = \phi_k(x_k), \qquad k = 1, 2, \ldots, N \tag{2}$$


The cost is a functional of the control policy

$$C = C[x_1, x_2, \ldots, x_N;\ U_1(x_1), U_2(x_2), \ldots, U_N(x_N)] \tag{3}$$

and the optimal control policy is the set of functions $U_1(\ )$,
$U_2(\ )$, ... $U_N(\ )$ which minimize the cost.

However, if there are uncertain system parameters, noise driving the
dynamic system and/or noise contaminating the measurements, the "true"
system state is a random variable, as is the cost. It is generally
impossible to know its present value or to predict its future value
with certainty. Our knowledge of the state of the system at the
present time is obtained by filtering the measurement data in order to
obtain the best possible estimate for the state at time k based on all
data up to time k, $\hat{x}_{k/k}$.

It should be clear that more accurate estimates of the present state,
and prediction of the future state of the system, will allow better
control of that state. In general the control policy can affect the
accuracy of the estimates and predictions. This use of the control
policy to improve state estimation and prediction is the second "dual"
goal of the control policy.

A control policy that has seen much use is simply the use of the best
estimate for the system state in place of the "true" but unavailable
state in a deterministic control policy. This policy is called the
certainty equivalence policy. In terms of the policy shown in Eq. (2),
the certainty equivalence policy would be written

$$U_{CE}(k) = \phi_k(\hat{x}_{k/k})$$

However, since, as pointed out above, the control policy in general
affects the quality of the estimate for the system state, it is
possible that in order to observe the system better, the control
should be driven in a direction different from the one which would be
optimal if there were no noise contaminating the measurements or
driving the system.

Note that a certainty equivalence control policy, by its very nature,
neglects this second "dual" goal of the optimal stochastic control
policy. By definition the control function used in the certainty
equivalence policy, $\phi_k(\ )$, is the optimal policy for the
deterministic control problem obtained by replacing all of the noise
terms and unknown parameters by their mean values. The certainty
equivalence control policy [$U_{CE}(k)$, k = 1, ... N] thus spends all
of its energy in trying to satisfy the first goal, keeping the state
on the desired trajectory with a minimum of control energy. This is
done ignoring the uncertainty in the knowledge of the true value of
the system state and at the expense of the second goal of learning
more about the true value of the state. Since the knowledge gained by
"probing" the system could enhance the accuracy of the state estimate
and thus allow more accurate control, it could greatly aid in
minimizing the overall control cost. This discussion should point out
why the certainty equivalence policy is generally suboptimal.

3.  CAUTION  AND  PROBING 

In an interesting series of papers by Bar-Shalom and Tse (4, 5, 6, 7)
there are discussions of the nature of optimal stochastic control
policies and certain approximations to such policies. They note
(similar to Feldbaum) that in stochastic control systems the optimal
stochastic control can have two effects. First it can "probe" the
system in order to "look into the future" and make the most use of
present information about what might be learned from later
measurements. To calculate such a policy in a feedback sense will
require considerable mathematical analysis. The second effect they
point out is the need for "caution". The "caution" term enters by an
apparent increase in the cost of control, thereby reducing the amount
of control effort that can be used in the optimal system. Both of
these effects arise from a detailed study of the structure of the
dynamic programming approach (the Principle of Optimality) and both
effects are due to the second "dual" effect.


4.  TWO  GOALS,  TWO  CONTROL  VECTORS 


Here a much simpler view of the problem is presented in a tutorial
manner. It is hoped that this discussion will lead to additional
insight on the part of the reader.


Figure 1

[...] be plotted in appropriate control space. The control needed for
the first goal (follow the desired trajectory ignoring probabilistic
considerations) and the second (obtain the most accurate possible
estimate of the system state) need not be conflicting. On the other
hand, they could very well be conflicting. In a dynamical system these
two control effects could well be in agreement at one stage and in
opposition at the next.

It is enlightening, in certain cases, to think of the optimal
stochastic control at stage k in terms of these two control vectors.
For example, if the control is incapable of affecting the estimation
accuracy, there is no "estimation optimal" control and $U_{EST}$
should not appear on the diagram or should be given zero weight. It is
just in this case that the "separation theorem" can be derived and the
certainty equivalence control is found to be the optimal stochastic
control.

The possibilities are indicated in Figure 1 for a single time k. These
figures show the possible relationship between the control which would
be used, ignoring the ability of the control to affect the estimation
accuracy ($U_{CE}$), and the control chosen to give the best, minimum
variance, estimate without regard to the desired trajectory
($U_{EST}$) at time k.

[...] there could be a zero [...] would represent the [...] to get
minimum con[...] control represented by [...] already included in the
calculation of the deterministic certainty equivalence control
($U_{CE}$). The "estimation optimal" control $U_{EST}$ should,
perhaps, also be reduced to account for a finite cost for control.

At another extreme, one might envision the case where the control
could affect the estimation accuracy, but could not affect the basic
deterministic cost function itself. In this case, there would be no
reason to improve the estimation accuracy and the optimal control
policy would be to use no control effort. This latter case shows that
the second "dual effect" is secondary to the first.
It should be noted from Figure 1A that the controls required for each
of the dual effects do not necessarily have to be different. Thus, in
Figure 1A one would use control vectors pointing in the same direction
for both effects. If the length of the two vectors is the same at each
stage, the control policy giving the "best estimation" ($U_{EST}$)
would be the same as the one ignoring probabilistic effects
($U_{CE}$). If $U_{EST}$ is considerably shorter than $U_{CE}$, the
optimal control could be expected to exhibit some "caution" (5). On
the other hand, if the two control vectors were parallel as in Figure
1A but $U_{CE}$ was shorter than $U_{EST}$, the optimal control would
be expected to be stronger than that demanded by $U_{CE}$. This could
be thought of as additional "probing" (5). In an extreme case of
"caution", $U_{CE}$ and $U_{EST}$ are actually in opposite directions
(Figure 1B). In such a case, the control used for one goal might be
the worst thing one could do for the other. A detailed study of the
dynamics and stagewise progression of the system would be required to
obtain even an approximation of the true "optimal stochastic control",
rather than considerations of the tradeoff of these two controls at
each stage. This is because the conflict might appear to demand no
action at any given stage, but considerations of the effect of the
total control policy might indicate that effort should be expended
immediately to observe the system so that it could be controlled more
accurately at later stages.
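
The geometric picture above can be made concrete with a small sketch.
The Python function below labels the relation between two control
vectors from the cosine of the angle between them. The function name,
the tolerances and the label strings are assumptions introduced here
for illustration only; they do not appear in the paper.

```python
import math

def classify_alignment(u_ce, u_est, tol=1e-9):
    """Label the relation between two control vectors (hypothetical helper).

    u_ce  : control ignoring the effect on estimation accuracy
    u_est : control chosen only for the best estimate
    """
    dot = sum(a * b for a, b in zip(u_ce, u_est))
    norm_ce = math.sqrt(sum(a * a for a in u_ce))
    norm_est = math.sqrt(sum(b * b for b in u_est))
    if norm_ce < tol or norm_est < tol:
        # A zero vector has no direction, so no angle can be formed.
        return "undefined"
    cosine = dot / (norm_ce * norm_est)
    if cosine > 1.0 - 1e-6:
        return "aligned"      # same direction, as in Figure 1A
    if cosine < -1.0 + 1e-6:
        return "opposed"      # opposite directions, as in Figure 1B
    if abs(cosine) < 1e-6:
        return "orthogonal"   # as in Figure 1C
    return "oblique"
```

For example, `classify_alignment([1.0, 0.0], [0.0, 3.0])` falls in the
orthogonal case. The label says nothing about relative length, which
the discussion above ties to "caution" and "probing".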

The possibility of $U_{EST}$ being orthogonal to $U_{CE}$ is indicated
in Figure 1C. While at a single stage a control which was the vector
sum of the two might make sense, the continued use of such a
combination (if the orthogonality persisted) could drive the system
far from the desired trajectory. Thus, while the orthogonal "probing"
required by $U_{EST}$ would affect the optimal control policy, any
"probing" actually used would have to be considered later in the
trajectory in order to keep the system on the correct path.

From the above discussion it can be seen that there are only two cases
where the "optimal stochastic control law" can easily be obtained. The
first is when the control cannot affect the estimation accuracy and
the generally used certainty equivalence control law is optimal. The
second case would be one where $U_{EST}$ is coincident with $U_{CE}$
at each stage. In this case, $U_{CE}$ would again be the optimal
control policy. Even if the two controls are only colinear and
approximately the same length, we would expect $U_{CE}$ to drive the
state vector in such a way that the measurements could be used to
adequately estimate the state. Such estimates might involve the use of
fairly complex nonlinear filters but there would be no need to
consider the dual effect explicitly.

Here we suggest that the choice of the measurement device or
measurement function can lead to the aligning of these two vectors in
Figure 1. This aligning of the two goals will lead to what we define
as a "natural probing" of the system. The designer should choose the
measurement structure so that it will estimate the state in the best
manner possible when the control system is driving the system to the
desired trajectory. Then the dual goals will be naturally satisfied
without explicit consideration of the effect of the control on the
estimation accuracy. This effect is discussed in terms of a simple set
of examples below.

5.  SIMPLE EXAMPLES

Consider the simple scalar stochastic dynamic control system

$$x_{k+1} = \phi x_k + U_k + w_k, \qquad k = 0, 1, \ldots, N \tag{5}$$

where the scalar control is to be chosen in order to minimize the
expected value of the random cost functional C.

$$J = E\{C[U_0, U_1(\ ), \ldots, U_N(\ )]\} \tag{6}$$

$$C = \sum_{j=1}^{N+1} \left(a_j x_j^2 + b_{j-1} U_{j-1}^2\right) \tag{7}$$


The noise $w_k$ driving the dynamics is taken to be a white, zero-mean
stochastic process with covariance matrix $Q_k$. The stochastic
process $w_k$ is independent of the a priori state $x_0$, which is a
random variable with mean $\bar{x}_0$ and covariance $P_0$.
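
To make the problem setup concrete, the following Python sketch
simulates one trajectory of a scalar system and accumulates a
quadratic cost. It assumes dynamics of the form x_{k+1} = phi x_k +
U_k + w_k with a constant phi, a cost that sums a_j x_j^2 + b_{j-1}
U_{j-1}^2, and Gaussian noise; the function names and the Gaussian
assumption are illustrative choices, not taken from the paper.

```python
import random

def simulate_cost(phi, a, b, policy, x0, w_std, rng):
    """Run one trajectory and return the realized cost C.

    a[k] holds a_{k+1} (weights on x_1 .. x_{N+1});
    b[k] holds b_k     (weights on U_0 .. U_N).
    policy(k, x) returns U_k; in this sketch it sees the true state.
    """
    x = x0
    cost = 0.0
    for k in range(len(b)):
        u = policy(k, x)
        w = rng.gauss(0.0, w_std)      # Gaussian noise is an assumption
        x = phi * x + u + w            # assumed form of the dynamics
        cost += a[k] * x * x + b[k] * u * u
    return cost

def estimate_expected_cost(phi, a, b, policy, x0, w_std, runs, seed=0):
    """Monte Carlo estimate of J = E{C} over independent trajectories."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        total += simulate_cost(phi, a, b, policy, x0, w_std, rng)
    return total / runs
```

A call such as `estimate_expected_cost(1.0, [1.0] * 3, [1.0] * 3,
lambda k, x: -0.5 * x, 1.0, 0.1, 1000)` would compare candidate
feedback laws by their average cost. A policy that only sees an
estimate of the state would need a filter in the loop, which this
sketch omits.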

The  relationship  between  the  controls  required  to 
satisfy  the  dual  goals  at  each  stage  changes  with 
the  choice  of  measurement  function.  Consider 
the  following  cases. 

Case 1

No measurement information.

Case 2

$$z_k = H_k x_k + v_k$$

Case 3

$$z_k = H_k x_k^{[\ldots]} + v_k$$

Case 4

$$z_k = H_k x_k + v_k U_{k-1}$$

Case 5

$$z_k = \mathrm{SIGN}(H_k x_k + v_k)$$

In each case $v_k$ is taken to be a zero-mean white stochastic process
with covariance $R_k$; $v_k$ is also independent of both $w_k$ and
$x_0$.

First remember that if $Q_k$ and $P_0$ are zero we reduce to the
deterministic case and $x_k$ is explicitly available. In this case the
optimal deterministic control policy is

$$U_k^D = -\Lambda_k x_k = -\Lambda_k(\phi^k x_0 + \phi^{k-1} U_0 + \ldots + U_{k-1})$$

when written as an equivalent open loop policy. The calculation of the
$\Lambda_k$ is well documented in the literature (8).
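
One standard way to compute such feedback gains is a backward dynamic
programming recursion. The sketch below is a hedged illustration, not
necessarily the procedure of reference (8): it assumes scalar dynamics
x_{k+1} = phi x_k + U_k with constant phi and the quadratic cost that
sums a_j x_j^2 + b_{j-1} U_{j-1}^2 for j = 1 to N+1. The function name
and the argument layout are hypothetical.

```python
def feedback_gains(phi, a, b):
    """Return gains [L_0, ..., L_N] such that U_k = -L_k * x_k.

    a[k] holds a_{k+1} (weights on x_1 .. x_{N+1});
    b[k] holds b_k     (weights on U_0 .. U_N).
    Assumes b[k] + s > 0 at every stage so the minimum exists.
    """
    n = len(b)
    if len(a) != n:
        raise ValueError("a and b must have the same length")
    s = a[-1]                      # cost-to-go weight at stage N+1
    gains = [0.0] * n
    for k in range(n - 1, -1, -1):
        # Minimize b_k U^2 + s (phi x + U)^2 over U.
        gains[k] = s * phi / (b[k] + s)
        # a_k is stored at a[k - 1]; x_0 carries no state cost.
        state_weight = a[k - 1] if k >= 1 else 0.0
        s = state_weight + phi * phi * b[k] * s / (b[k] + s)
    return gains
```

With `phi = 1.0`, `a = [1.0, 1.0]` and `b = [1.0, 1.0]`, the last
stage gives a gain of 0.5 and the first stage a gain of 0.6, which can
be checked by hand from the recursion.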

In Case 1 and Case 2 the optimal stochastic control policy can be
explicitly calculated. In Case 1 it is given by

$$U_k = -\Lambda_k(\phi^k \bar{x}_0 + \phi^{k-1} U_0 + \ldots + U_{k-1})$$

and in Case 2 the optimal stochastic control policy is given as

$$U_k = -\Lambda_k \hat{x}_{k/k}$$

when written as a feedback control policy or

$$U_k = -\Lambda_k([\ldots] + \phi^{k-1} U_0 + \phi^{k-2} U_1 + \ldots + U_{k-1})$$

Here $\hat{x}_{k/k}$ is the best linear estimate of the state $x_k$
conditioned on all the measurement data $z_1$, $z_2$, ... $z_k$ and is
given by the Kalman filter. In both cases the control policy is the
same as the optimal deterministic control policy with the state
replaced by the best available estimate for the state at stage k.
These optimal stochastic control policies can be obtained by solving
two separate problems, the deterministic control problem and the state
estimation problem. This fact is called the Separation Principle. The
principle results from the fact that the control policy has no effect
on the estimation accuracy. In this case the second control $U_{EST}$
in Figure 1 is indeterminate and should be given no weight.
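
The claim that the control has no effect on the estimation accuracy in
the linear case can be seen in a scalar Kalman filter step. The sketch
below assumes a linear measurement z = h x + v and dynamics x_{k+1} =
phi x_k + U_k + w_k with noise variances q and r; the function name
and argument names are illustrative assumptions. The control u shifts
the predicted mean but never enters the variance update.

```python
def kalman_step(x_hat, p, u, z, phi, h, q, r):
    """One predict/update cycle of a scalar Kalman filter.

    x_hat, p : previous estimate and its error variance
    u        : control applied over the step
    z        : new measurement, assumed z = h * x + v
    Returns the updated estimate and error variance.
    """
    x_pred = phi * x_hat + u          # control moves the mean only
    p_pred = phi * phi * p + q        # variance prediction has no u
    gain = p_pred * h / (h * h * p_pred + r)
    x_new = x_pred + gain * (z - h * x_pred)
    p_new = (1.0 - gain * h) * p_pred
    return x_new, p_new
```

Calling the function twice with different values of `u` and otherwise
identical arguments returns different estimates but the same error
variance, which is the property behind the Separation Principle
described above.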

Case 3 is discussed in references (11) and (12). Due to the nature of
the cost function, $U_{CE}$ will always try to drive the state
estimate to the origin. However, due to the nature of the measurement,
the signal to noise ratio will be worst at the origin and will get
better as the state moves from the origin. Thus $U_{EST}$ will point
in the opposite direction from $U_{CE}$. As shown in reference (12),
using $U_{CE}$ at the first stage can be the worst possible control to
use. This is the situation indicated in Figure 1B.

In Case 4 (discussed in more generality in reference (13)), the use of
the control at stage k-1 will reduce the accuracy of the measurements
at stage k. Since the best measurements are obtained when no control
is used, $U_{EST}$ is identically zero. The optimal control is thus
reduced by some amount from $U_{CE}$. As can be seen from this and as
shown in reference (13), the optimal stochastic control is again of
the form

$$U_k = -\Lambda_k \hat{x}_{k/k}$$

but the weighting matrices ($\Lambda_k$) are not the same as in the
deterministic case or in Cases 1 and 2. The control and estimation
problems are again separated but the control is not the certainty
equivalence control. The use of $U_{CE}$ will increase the noise in
the measurements and generally affect the accuracy of the state
estimates and thereby greatly degrade the control performance. Here
the optimal stochastic control exhibits "caution". The optimal
stochastic control is aligned with $U_{CE}$ but reduced in length as
it would be in Figure 1A.

In Case 5, discussed at some length in reference (14), the two control
goals are aligned. The measurement function adds maximum information
about the state when the sign changes unexpectedly or when the "true
state" is near zero. The desired trajectory in this problem will also
require that the control drive the state to zero (regulator problem).
Thus, as shown in reference (14), the performance obtained from the
use of $U_{CE}$ is very close to a known lower bound to the
performance of the optimal stochastic control. Thus in aligning the
two control objectives we can approach the true optimal stochastic
control ("dual performance") with much less computation than required
to calculate the true "dual control" law.
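
One way to see why a sign measurement is most informative near zero is
to compute the Fisher information that a single reading carries about
the state. The sketch below assumes z = SIGN(h x + v) with Gaussian v
of variance r; the Gaussian assumption, the use of Fisher information
as the measure, and the function name are choices made here for
illustration and are not taken from reference (14).

```python
import math

def sign_measurement_information(x, h=1.0, r=1.0):
    """Fisher information about x in one reading z = sign(h*x + v).

    Assumes v is Gaussian with zero mean and variance r.
    """
    sigma = math.sqrt(r)
    s = h * x / sigma
    pdf = math.exp(-0.5 * s * s) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(s / math.sqrt(2.0)))
    spread = cdf * (1.0 - cdf)         # variance of the binary outcome
    if spread <= 0.0:
        # Far from the origin the sign is certain and carries no news.
        return 0.0
    return (h / sigma) ** 2 * pdf * pdf / spread
```

Under these assumptions the information is largest at x = 0 and falls
off as the state moves away from the origin, so a regulator that
drives the state toward zero also keeps the measurement in its most
informative region.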


6.  CONCLUSIONS 

Here it has been suggested that, when dealing with dynamic stochastic
systems observed by noisy measurements, measurement transducers be
selected, when possible, with an eye to aligning the two dual control
goals. This could result in improved control system performance
without requiring increasingly complicated control laws.

The aligning of these two goals will lead to an approximate type of
separation principle in that the control chosen for the primary goal
will tend to automatically drive the system to improve the accuracy of
state estimates.